A prospective multi-center study quantifying visual inattention in delirium using generative models of the visual processing stream

The visual attentional deficits in delirium are poorly characterized. Studies have highlighted neuro-anatomical abnormalities in the visual processing stream but fail at quantifying these abnormalities at a functional level. To identify these deficits, we undertook a multi-center eye-tracking study where we recorded 210 sessions from 42 patients using a novel eye-tracking system that was made specifically for free-viewing in the (ICU); each session lasted 10 min and was labeled with the delirium status of the patient using the Confusion Assessment Method in ICU (CAM-ICU). To analyze this data, we formulate the task of visual attention as a hierarchical generative process that yields a probabilistic distribution of the location of the next fixation. This distribution can then be compared to the measured patient fixation producing a correctness score which is tallied compared across delirium status. This analysis demonstrated that the visual processing system of patients suffering from delirium is functionally restricted to a statistically significant degree. This is the first study to explore the potential mechanisms underpinning visual inattention in delirium and suggests a new target of future research into a disease process that affects one in four hospitalized patients with severe short and long-term consequences.

The visual processing deficits underlying visual inattention in delirium remain poorly understood despite visual inattention forming a crucial part of the definition of delirium 1 .Identifying these deficits is crucial for the understanding of the pathogenesis of delirium and for the development of targeted therapies in a syndrome that affects one in four hospitalized individuals with its associated mortality, morbidity, and long-term cognitive sequelae [2][3][4][5][6][7] .Radiological investigations, including Magnetic Resonance Image (MRI) and Computed Tamography (CT) Positron Emission Tomography (CT-PET), have revealed evidence of functional disconnectivity between the occipital lobe, specifically the ventral visual processing areas, and the anterior cingulate cortex, a region crucial for memory and attention regulation [8][9][10] .Similarly, Electroencephalogram (EEG) studies have demonstrated the diagnostic specificity of electrodes located on the occipital lobe, implicating altered neuronal responses in visual processing as a potential core feature of delirium 5,6 .However, despite these valuable insights, current Radiological and EEG approaches fall short of providing a comprehensive quantification of the functional errors in visual processing that occur in patients with delirium.
The definition of attention in the delirium literature does not draw distinction on the type of attention.and we therefore chose to focus on visual attention as a physically measurable signal that manifests from internal cognitive functions 11,12 .Broadly, cognitive attention refers to the capacity to maintain focus on a particular mental task or thought, involving the brain's executive control systems that manage sensory inputs and coordinate responses 13 .This form of attention is integral to memory, decision-making, and problem-solving.A subset of this is visual attention; this type of attention involves mechanisms for detecting and responding to visual elements within the environment, which are intrinsicly linked to higher cognitive processes 14 .In the context of delirium, quantifying and understanding visual attention is is pivotal to progress the understanding of the specific pathways and mechanisms affected by delirium.
To examine and quantify the visual processing that patients with delirium exhibit that in could result in visual attention deficits, we deplyed a novel camera-based eye-tracking device prospectively across two Intensive Care Units (ICUs) general district hospitals.We recorded 262 eye-tracking sessions across 42 patients (ClinicalTrials.gov-NCT04589169) where the patients were free to view the scene in front of them without specific stimulus.Each recording was labeled with the patient's delirium status using Confusion Assessment Method in ICU (CAM-ICU) 12 .
We then developed a novel neurocomputational generative model capable of simulating and comparing functional units of the ventral visual processing stream.This model leverages the functional hierarchy of visual processing to emulate neural processes given the scene that is being viewed and an internal cognitive task.This enables the creation of a dynamic baseline for visual attention.These models simulate the typical visual processing pathway, dynamically adapting to changes in the visual scene and the viewer's fixation and subsequently, their attention.By modeling the normal hierarchical processes of visual attention-from basic visual cue detection in early units to complex object and face recognition in later stages-we establish what normal attentional engagement should look like across different visual tasks and environments.This baseline model can then facilitate comparisons between different patients by constraining the comparison to that baseline-our generative model.
We validate our generative model in two ways, firstly under simulation to understand its characteristics, and secondary in an existing dataset of participants undergoing a visual search task.We then apply the model to the eye-tracking data collected from patients in Adult Intensive Care Unit (AICU) to quantify the functional errors in visual processing that occur in patients with delirium.
The data and the analysis technique suggest that delirium is punctuated by functionally restricted processing throughout the visual processing hierarchy.This finding quantifies this restriction and posits that a potential mechanism that underlies visual in-attention could be at the level of visual encoding prior to working memory, as previously thought.

Method
The reported study was approved by the NHS Health Research Authority (HRA) and Research & Ethics Committee (REC) (20/LO/0162) and registered on ClinicalTrials.gov(NCT04589169 15 ).As delirium is a capacitylosing state, informed consent was sought from a next of kin or a nominated consultee should the patient suffer from delirium.Patients were recruited between November 2020 and February 2022 from two interdisciplinary general medical and surgical hospitals.Participants were not remunerated for their participation in the study.All research methodology was conducted in accordance with the Declaration of Helsinki.

Participants
A total of 42 patients were recruited who had a total risk-adjusted probability of developing delirium greater or equal to 20% as per the Early PREdiction of DELIRium in ICu patients (E-PRE-DELIRIC) score 16 .Exclusion criteria were limited to the inability to measure eye movements appropriately or gain ground truth annotation of delirium.Delirium was diagnosed by fluctuating alteration in cognition and memory formalized through CAM-ICU which was performed by an Intensive Care clinician contemporaneously with the measurements using the apparatus 12 .Participants were recruited from ICUs from Chelsea & Westminster Hospital (CWH) and West Middelsex Hospital (WMH) and NHS approvals, two district hospitals recieving general medical and surgical patients.Briefly, were sequentially recruited into the study as they were admitted to ICU and they met the eligbility criteria; Supplementary Information A details the recruitment process.Table 1 lists the characteristics of the patients enrolled in the study.

Eye movements & fixation measurement apparatus
A dual camera system was developed and validated for the exact purpose of eye movement measurement in patients in ICU in a clinically safe, accurate, and calibration-free manner 15,17 .Briefly, one camera is placed facing the patient, at a distance that does not interfere with clinical duties, and continuously regresses the gaze vector through an ensemble of neural networks.These gaze vectors were converted to fixations using a dispersion filter-the sequence of gaze vectors is classified as a fixation if their dispersion falls below 6 • and lasted between 300 and 600 ms 18,19 .Another, depth enabled, camera is then located behind the patient and looks at the scene with the same field of view as the participant.This camera is used to intersect the fixation vector with the scene revealing the pixel location of the patient's gaze 20 .The apparatus is placed for 10 min daily until discharge, with concurrent CAM-ICU measurement.The participants were not asked to perform any specific task while the gaze measurements were taken using the apparatus.Fig. 1 illustrates the apparatus.

Generative model overview
As the task of comparing eye movements from recordings of patients undergoing natural stimuli is unconstrained, we sought to create a generative baseline that can serve as a comparator.We empirically chose to create a neuro-computational baseline where we define the task of fixation prediction as a composition of a top-down, task-dependent signal, a temporal short-term memory component, and a bottom-up, scene-dependent signal [21][22][23][24][25] .This is formalized into a model that can generate a distribution over the set of future fixations: where ⊙ is the element-wise product, P(f t+1 |f t , s t ) is the probability distribution of the next fixation f t+1 condi- tional on the current fixation f and the scene s at time t.Task-oriented saccade biases, P tosb , encode a joint prob- ability of saccadic amplitude and direction which are known to be task dependent 26 .P ior represents the probability (1) of returning to a fixated location within a short period and functionally represents short-term memory, facilitating the exploration of the scene without return within T fixations: where represents a Gaussian distribution parameterized by a standard deviation set to 2 • reflecting the width of the human fovea, i is the inhibition Gaussian defined as �(�y − f t � 2 ) and r is the recovery Gaussian defined as �(�y − f t−k � 2 .y ∈ I represents all locations in image I . . 2 is the Euclidean distance, ⌊.⌋ clips the values between [0, 1].ξ is the number of fixations to consider and is equal to min(T, t − 1) where T is nominally set to 8 24 .This function is effectively discouraging return of the fixation to the same location within a short period, with a decaying profile.
Bottom-up fixations are modeled using a saliency map where P b is the probability of a fixation given the intrinsic parameters of that saliency model of the scene.We chose saliency models that represent the hierarchy of the visual processing stream starting from simple geometric shapes and increasing in complexity up to objectlevel saliency.
The combination of P b , P ior , and P tosb represent one model m from a set of models M each generating a distribution of the next fixation.We then compose multiple models, each with a different intrinsic notion of task-oriented saccade biases, inhibition of return, and saliency.The correctness of each model can then be calculated by sampling from the resulting distribution over the predicted fixation ( P m ) using the distance to the actual next fixation as the metric: where E is the expectation operator that samples n from the probability distribution for model P m .� * � 2 is the ℓ 2 norm.To normalize these measurements, the distances are normalized to the maximum possible value-the diagonal of the scene image I .This results in a distance metric where smaller distances are a result of more accu- rate predictions for that model.As smaller distances between the simulated fixation and the measured fixation, the following softmax function is used: Illustration of the camera setup and how gaze vectors were extracted to be converted to fixations.The head camera is placed facing the patient and is used to regress the gaze vector through an ensemble of neural networks.The scene camera is placed behind the patient and is used to intersect the gaze vector with the scene revealing the pixel location of the patient's gaze, termed the Point of Regard (PoR).The data generated is then stored, labeled with the patient's delirium status, and analyzed.
where d m is the current model's distance, and |M| is the cardinality of the set of all models.Parameter β dictates the sharpness of the softmax function encouraging separation between models.This results in a normalized distribution where the model that is most in line with measured fixation ( fix t+1 ) has the highest probability.Each model m then competes for attribution in a winner-takes-all scheme where the model with the highest probability ( σ m ), and thus the lowest error, is rewarded a token: Further formalization is in Supplementary Information B and Fig. 2 illustrates the set of generative models.
To mirror the functional hierarchy of the visual processing stream, we parameterized the saliency maps ( P b ) within each model m with the hierarchical structure of the visual processing stream in humans.Early visual units encode low levels of saliency, such as edges, and contrast.Mid-level cortical centers encode proto-objects such as boxes, and cylinders, whereas later cortical centers encode object identifiers, motion, and faces.Thus, set M is a hierarchical set of models of the visual processing stream where lower-level models are presented in lower elements in the set ( m i < m i+1 ∀m ∈ M ).Task-oriented saccade biases and inhibition of return are shared across all models in M as are thought to originate from higher cortical centers 24 . (4)

Validation and evaluation
To validate the generative model used for the analysis of the eye fixation data, we sought to understand its characteristics firstly in simulation, and then evaluate its performance on an external dataset where the internal cognitive state is known.The simulation facilitates the probing of the generative models when both the bottom-up influences (i.e., the scene) and top-down influences (i.e., task-oriented saccade biases, and inhibition of return) are controlled for.This also facilitated the tuning of hyper-parameters β and T.
The external dataset, one of visual search, facilitated the control of the scene, and hence bottom-up influences, through the explicit presentation of a stimulus whilst the eye patterns, and hence top-down influences, were left free.This facilitated the assessment of the generative models' ability to recall ground truth cognitive internal states.

Results
Following ethical and regulatory approvals, 42 patients were recruited culminating in 210 recordings; their characteristics are listed in Table 1.Recruited patients underwent a once-daily assessment of delirium through CAM-ICU 12 with concurrent measurements of eye movements and the fixation on the scene, for 10 min daily until discharge from ICU.The data gathered was then used to infer the internal cognitive state of the patient through the generative model, dichotomized by the delirium status of the patient.

Model with the highest reward reproduced the ground truth
To characterize the generative models, we first conducted a simulation exercise where we simultaneously created the scene and created the fixations and probed the general models for their robustness to noise and rapidity of change detection.This characterization also served to find the optimal value for hyper-parameter β .Supple- mentary Information C outlines the simulator and Supplementary Figures S4a and 4b illustrates the result of the simulation.A test of normality of the P m demonstrated a highly non-normal (Kolmogorov-Smirnov test; D = 0.635 ) cementing the requirement for sampling from the predictive distribution as per Eq. 3. www.nature.com/scientificreports/Two simulations were conducted, a constant simulation to identify model separation and a switch task for change detection.Model counts were significantly different from the simulated fixated environment and simulated, non-fixated, environment ( χ 2 = 190, p < 0.001, df = 1 ).There was no delay between the simulated fixation switch and the model's response ( t(48) = 4.8, p < 0.001 ).This finding was consistent regardless of the fixation's start position ( p = 0.98 ) and irrespective of the location of the target ( p = 0.87)

Visual search task unveiled by the model with the highest reward
We next sought to demonstrate the generative models' ability to generate plausible fixations under a fixed internal cognitive task.We used a pre-existing dataset from 26 participants undergoing a visual search task 27 .
The generative models were arranged hierarchically representing the functional connectivity of the processing stream of the participants with the addition of a center bias model due to the participants having fixed head positions.The model with the highest reward correctly identified the moment that the participant fixated on the visual search target ( χ 2 = 190, p < 0.001, df = 4 ).The model with the highest reward at time t − 1 was also predictive of future fixation f and time t (ANOVA; F = 6.8, p < 0.001).

Visual processing in acute delirium
Given our findings that the generative model set M is accurate in simulation, recalling visual attention behavior, and is predictive of future fixations, we analyzed the data gathered from patients in ICU undergoing task-free natural viewing.
Table 2 demonstrates the results of this analysis.We tallied the reward of each model which originates from a fixation originating from a recording that was labelled with the delirium status of the recording.The table demonstrates a statistically significant difference between the rewards of the models dichotomized by delirium status ( χ 2 , p < 2.2 × 10 −16 for all models).This suggests that visual attention is markedly reduced between patients who have suffer from delirium compared to those who have not.Effect size is most striking for units in the early and middle of the hierarchy but is also present in late and facial units.
More specifically, the effect sizes are more substantial in the early and middle visual processing units.This finding suggests that the primary visual processing stages, which are responsible for the detection and basic interpretation of visual cues, are impaired in delirious patients.Additionally, similar disruptions are present in the middle units, which deal with the processing of proto-objects, indicating that the integration of visual information is also compromised.
Furthermore, the analysis also reveals differences in the late and facial recognition units.This indicates that not only are the basic and intermediate visual processing capabilities affected, but there is also a considerable impact on the higher-order visual processing abilities, which include the integration and interpretation of complex visual scenes and facial recognition.

Discussion
Delirium is marked by acute disruptions in attention.Visual attention, a quantifiable and measurable subtype of attention, involves the selective processing of visual stimuli which we measured through eye-tracking using a custom eye-tracking apparatus that is suitable for use in critical care.This study focused on visual attention to understand the attentional impairments in delirium.
Altogether, our findings suggest a functional reduction in the working capacity of the ventral visual stream of patients suffering from delirium.This reduction is marked across the entire hierarchy of the visual stream including specialist areas such as the facial recognition unit.
Prior work on the understanding of the internal mechanisms highlighted that visual in-attention is a hallmark of acute delirium.Radiological investigations (CT, CT-PET, and MRI) have been heterogeneous but have broadly demonstrated reduced functional dysconnectivity between sensory processing and memory encoding units.These findings have been backed by EEG studies that suggest a reduction in high-order cortical activity in delirium; most importantly, the biggest discriminatory power for EEG seems to arise by the comparison of posterior and anterior electrodes.However, both radiological and electrophysiological techniques have failed at identifying the dysfunction of those areas and have to be inferred from the structure and counter function of studies of healthy individuals.Biochemical analysis of cerebrospinal fluid from patients suffering from delirium highlighted increased metabolic pressure suggesting restricted function at a neuronal level.

Table 2.
Results of the analysis of the generative models dichotomized by delirium status.Statistical significance testing using the χ 2 test between the tallied probabilities of delirium vs. non-delirium with the null hypothesis being of no difference demonstrates that there is a statistically significant difference across all functional visual processing modules between patients with and without delirium.To understand the functional issues underlying visual attention in delirium, we reformulated the task of visual attention to fixation prediction and decomposed that task to account for bottom-up and top-down influences.This formulation led to a generative model that generates future fixation predictions that are conditional on past fixations and the scene.This facilitated the generation of multiple plausible fixations under different assumptions and the measurement of which generating process is most closely in line with viewing behavior.We initially validated the generative models in increasingly complex simulations to ascertain their properties.We then went further and validated them using a pre-existing dataset of visual search where we demonstrated accurate inference of the internal cognitive task to a highly statistically significant degree.
Lastly, and in answering our original question, we gathered data from 262 recordings from 42 patients in an ICU setting across two hospitals.In this experiment, we modeled visual attention using a similar computational hierarchy of the human visual processing cortex.We demonstrate that visual processing throughout the ventral visual stream has functionally reduced capacity compared to patients not suffering from delirium.The uniformity in the statistical significance across these different units of visual processing underscores the widespread nature of the impact that delirium has on the visual attention system.These results provide a quantitative basis for understanding the severity and extent of visual processing disruptions in delirious patients, restricted not only to the initial stages of visual attention but extending throughout the visual processing hierarchy.
This finding suggests that visual encoding and visual processing may play a role in the development of delirium, potentially preceding cognitive impairments.It provides valuable insights into the disease process and opens avenues for further research into the role of the ventral visual stream in acute delirium.Additionally, this finding has potential clinical implications, as interventions aimed at modifying the course of delirium can now consider the impact on visual processing, which can serve as a meaningful indicator of intervention effectiveness.

Limitations and future work
While we have demonstrated the utility of our generative models in understanding the functional impairments in visual processing in delirium, there are several limitations to our study.Firstly, the generative models were informed by pre-existing data on the visual processing in humans, and thus the set of models ( M ) was restricted to a set of 4 models.A larger set of models would represent a more granular inspection of the visual processing hierarchy.Secondly, the hyper-parameters for each model m, although informed by validation on an external dataset, could be markedly different in patients suffering from delirium and thus represent a form of possible distributional shift.Future work should focus on expanding the set of generative models to include more granular models of the visual processing hierarchy and to investigate the impact of distributional shifts on the performance of the generative models.Additionally, future work should focus on investigating the impact of different hyperparameters on the performance of the generative models in patients suffering from delirium.Also, while task-oriented saccade biases' distributions have a myriad of shapes, we chose to model them as a minimum entropy distribution.Future work should focus on investigating the impact of different distributions on the performance of the generative models in patients suffering from delirium.This can be done by using eye-movement data from recordings of patients with delirium in this study.Lastly, a set of those saccade biases can be incorporated in the analysis in a combinatorial fashion but would require a more sophisticated model selection algorithm and a larger sample size.Lastly, we can also use a data-driven approach to jointly learn the structure and parameters of the simulated visual processing pipeline, which would make introspection even more impactful.

Conclusion
The ventral visual processing stream of patients suffering from acute delirium is functionally restricted compared to patients who are not delirious.This implicates visual encoding as an upstream process prior to cognitive processing in the development of delirium.Further work is required to phenotype the deficits across sub-populations suffering from inflammatory, toxic, or hypoxic delirium.

Figure 2 .
Figure 2.Illustration of the generative model, composed of a hierarchical set of models, that output a probabilistic map of fixation locations that can then be compared to patient fixations to infer the function of the visual processing stream.Each model is composed of saliency, task-oriented saccade biases, and inhibition of return and they jointly result in a probabilistic map of the future fixation.The error of this prediction, compared to the actual measurement ( f t+1 ) can then be calculated, and normalized, and the model with the highest probability wins a token.These are then tallied up and compared.Saliency is calculated from the scene information which forms bottom-up signal propagation whilst the inhibition of return and saccade biases model top-down influences on eye-movement behavior.

Table 1 .
Characteristics of patients enrolled as part of the deployment of the eye-tracking platform forming a clinical feasibility study.Pertinent confounding variables related to outcomes of interests as well as variables that predict the development of delirium are presented across the two hospitals; Chelsea & Westminster Hospital (CWH) and West Middelsex Hospital (WMH).